\section{Introduction}
\label{sec-introduction}

Function calls in a dynamic language like \commonlisp{} can be
significantly more expensive in terms of processor cycles than
function calls in a typical static language.  There are several
reasons for this additional cost:

\begin{enumerate}
\item With \emph{late binding} being a requirement, i.e., the fact that
  functions can be removed or redefined at run-time, and that callers
  must take such updates into account, it is necessary to have some
  indirection that can be modified at run-time.  Mechanisms such as
  compiler macros and inlining break this requirement, which is often
  a serious drawback to their use.
\item \commonlisp{} has a rich function-call protocol with optional
  parameters and keyword parameters.  Keyword parameters, in
  particular, require some considerable run-time parsing for every
  call to a function that has such parameters.
\item In general, a function that can honor its contract only for
  certain types of its arguments must check such types for each call.
\item All objects must be \emph{boxed} in order to be used as function
  arguments.  For example, IEEE double-float values will typically
  have to be allocated on the heap, though so-called NaN-boxing
  \cite{Gudeman93representingtype} can eliminate that particular case.
  Full-word integers still require boxing, however.  Similarly, boxing
  is required for values returned by a function.
\item Generic functions can be dynamically updated by the addition or
  removal of methods.  Thus, even when the callee is a known generic
  function, callers can make no assumptions about which methods might
  be applicable.
\item The fact that a function can return multiple values requires the
  callee to return additional information about the number of
  return values, and callers that accept multiple values must retrieve
  this information in order to access the return values, and use
  default values when it expects more values than the callee
  returned.
\end{enumerate}

In a typical \commonlisp{} implementation, item number~1 is handled by
an indirection in the form of a slot in the symbol naming the
function, requiring a memory access.  On modern processors a memory
indirect branch is more costly than a direct branch.  Even if the
branch-prediction logic of the processor is able to make the right
decision in the indirect case, there is at least the additional cost
of accessing the cache.

Item number 2 can be mitigated by the use of compiler macros.
Essentially, the creator of a function with a non-trivial lambda list
can also create special versions of this function for various argument
lists.  A call with an argument list that is recognized by the
compiler macro can then be replaced by a call to such a special
version, presumably with a simpler lambda list.

Item number 3 can be handled by inlining, allowing the compiler to
take advantage of type inference and type declarations to determine
that some type checks can be elided.  However, inlining has the
disadvantage that a redefinition of the callee will not automatically
be taken into account, thereby requiring the caller to be recompiled for
the redefinition to be effective.

The problem indicated by item number 4 can be largely eliminated by
the use of more than one entry point for functions, one of which would
accept unboxed arguments.  This technique is used by Allegro Common
Lisp%
\footnote{https://franz.com/products/allegro-common-lisp/} which also
allows for functions to return a single unboxed value.

Concerning item number 5, the main difference between function
redefinition and generic-function updates is that a generic function
consists of independent \emph{effective methods}, only one of which is
applicable for a particular call.  To determine which effective method
is applicable, in the general case some significant \emph{generic
  dispatch}, based on the class or the identity of arguments, may be
required.

A common technique for handling item number 6 is to recognize that
most callers will use only a single return value.  Then, if the callee
returns no values, the register holding the first return value will
nevertheless be initialized to \texttt{nil}.  As a result, callers
that use a single return value never have to test the number of
values actually returned.

In this paper, we propose a very general technique for call-site
optimization that can handle many of the issues listed at the
beginning of this section.  We plan to incorporate this technique in
the \sicl{}%
\footnote{https://github.com/robert-strandh/SICL}
implementation of the \commonlisp{} language.

%%  LocalWords:  callee
